An experiment in authorship attribution
Abstract
This paper reports an experiment in authorship attribution that reveals considerable authorial structure in texts written by authors with very similar backgrounds and training, with genre and topic strictly controlled. We interpret our results as supporting the hypothesis that authors have 'textual fingerprints', at least for texts produced by authors who are not consciously changing their style of writing across texts. The study also shows that discriminant analysis is more appropriate than principal components analysis for predicting the authorship of an unknown (held-out) text from known (training) texts whose authorship is established. Finally, standard discriminant analysis can be enhanced considerably by an entropy-based weighting scheme of the kind used in latent semantic analysis (Landauer et al., 1998).
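As an illustration of what an entropy-based weighting scheme can look like, the sketch below implements the log-entropy weighting commonly used in latent semantic analysis. It is an assumption that this common form resembles the scheme used in the paper; the abstract does not give the exact formula. The function names `entropy_global_weights` and `apply_log_entropy` and the toy counts are hypothetical.

```python
import math


def entropy_global_weights(counts):
    """Return one global weight per term for a document-by-term count matrix.

    counts: list of documents, each a list of raw term counts (equal lengths).
    Assumed formula (common LSA log-entropy form, not taken from the paper):
        g_t = 1 + sum_d p_td * log(p_td) / log(n_docs),
    where p_td = count of term t in document d / total count of term t.
    A term spread evenly over all documents gets weight 0; a term
    concentrated in a single document gets weight 1.
    """
    n_docs = len(counts)
    if n_docs < 2:
        raise ValueError("need at least two documents")
    n_terms = len(counts[0])
    weights = []
    for t in range(n_terms):
        total = sum(doc[t] for doc in counts)
        entropy_sum = 0.0
        for doc in counts:
            if doc[t] > 0:  # zero counts contribute nothing to the sum
                p = doc[t] / total
                entropy_sum += p * math.log(p)
        weights.append(1.0 + entropy_sum / math.log(n_docs))
    return weights


def apply_log_entropy(counts):
    """Scale log(1 + count) by the global entropy weight of each term."""
    weights = entropy_global_weights(counts)
    return [
        [math.log(1 + doc[t]) * weights[t] for t in range(len(weights))]
        for doc in counts
    ]


# Toy example: two texts, three terms.
toy_counts = [[2, 4, 0], [2, 0, 0]]
toy_weights = entropy_global_weights(toy_counts)
toy_weighted = apply_log_entropy(toy_counts)
```

In this toy matrix the first term occurs equally often in both texts, so its weight is driven to zero, while the second term occurs in only one text and keeps its full weight. The weighted matrix, rather than the raw counts, would then be passed to the discriminant analysis.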
Similar resources
Authorship Attribution Using Text Distortion
Authorship attribution is associated with important applications in forensics and humanities research. A crucial point in this field is to quantify the personal style of writing, ideally in a way that is not affected by changes in topic or genre. In this paper, we present a novel method that enhances authorship attribution effectiveness by introducing a text distortion step before extracting st...
Domain Independent Authorship Attribution without Domain Adaptation
Automatic authorship attribution, by its nature, is much more advantageous if it is domain (i.e., topic and/or genre) independent. That is, many real world problems that require authorship attribution may not have in-domain training data readily available. However, most previous work based on machine learning techniques focused only on in-domain text for authorship attribution. In this paper, w...
An Extremely Simple Authorship Attribution System
In this paper we present a very simple yet effective algorithm for authorship attribution. By this term we mean the act of telling whether a certain text was or was not written by a certain author. We shall not discuss the advantages or applications of this activity, but we propose a method for doing it in an automatic and instantaneous way, neither considering the language of the texts nor und...
Questioned Electronic Documents: Empirical Studies in Authorship Attribution
Forensic analysis of questioned electronic documents is very difficult, because the nature of the documents eliminates many kinds of informative differences. Recent work in authorship attribution demonstrates the practicality of analyzing documents based on authorial style, but the state of the art is confusing. Analyses are difficult to apply, little is known about type or rate of errors, and ...
N-gram-based Author Profiles for Authorship Attribution
We present a novel method for computer-assisted authorship attribution based on character-level n-gram author profiles, which is motivated by an almost-forgotten, pioneering method in 1976. The existing approaches to automated authorship attribution implicitly build author profiles as vectors of feature weights, as language models, or similar. Our approach is based on byte-level n-grams, it is l...
Authorship Attribution Using Word Network Features
In this paper, we explore a set of novel features for authorship attribution of documents. These features are derived from a word network representation of natural language text. As has been noted in previous studies, natural language tends to show complex network structure at word level, with low degrees of separation and scale-free (power law) degree distribution. There has also been work on ...
Publication date: 2002